SUMMARY. In this paper, we analyze a two-level latent variable model for longitudinal data
from the National Growth and Health Study where surrogate outcomes or biomarkers and covariates
are subject to missingness at any of the levels. A conventional method for efficient handling of 
missing data is to reexpress the desired model as a joint distribution of variables, including the 
biomarkers, that are subject to missingness conditional on all of the covariates that are completely 
observed, and estimate the joint model by maximum likelihood, which is then transformed to the 
desired model. The joint model, however, identifies more parameters than desired, in general. We 
show that the over-identified joint model produces biased estimation of the latent variable model, 
and describe how to impose constraints on the joint model so that it has a one-to-one correspon- 
dence with the desired model for unbiased estimation. The constrained joint model handles miss- 
ing data efficiently under the assumption of ignorable missing data and is estimated by a modified 
application of the expectation-maximization (EM) algorithm. 
data; Latent variable; the EM algorithm


1. Introduction 

The National Heart, Lung, and Blood Institute initiated the Growth and Health Study (NGHS) in 1985 to
investigate ethnic disparities in dietary, family, psychosocial and physical activity factors of obesity
among 2,379 girls. It collected data on development of obesity and factors associated with
the development from 1,213 African-American and 1,166 white girls. The study followed the 
subjects from 1987-1988 when they were 9 to 10 years old until 1996-1997 when they were 18 to 
19 years old. The subjects were assessed on development of obesity and related factors annually 
[1]. 

We consider multiple biomarkers of obesity: body mass index (BMI), sum of skinfolds at 
triceps, subscapular, and suprailiac sites (Skinfold), maximum below-waist circumference (Waist), 
and percent body fat by bioelectrical impedance analysis (PercentFat). Many investigators have 
identified the risk factors of child obesity using one of these biomarkers as an outcome variable 
[2-6]. Although useful, some of these biomarkers do not differentiate the fat mass from body mass 
while others are measured with error. For example, BMI, the ratio of body weight in kilograms 
to height in meters squared, is widely used to define obesity (BMI > 30) for men and women.
Consequently, it is broadly analyzed as an outcome variable that serves as a surrogate for body fat. However, it
cannot distinguish muscle mass from body adiposity, in particular, for children and adolescents 
[7-9]. 

Our analysis aims to quantify child obesity via multiple biomarkers and study its risk factors 
simultaneously. Specifically, we want to control for ethnic and social disparities in the growth 
of obesity, and ask how environmental factors such as TV watching and mother’s BMI influence 
the development of child obesity. Because obesity is not directly observable, NGHS collected 
the four biomarkers of obesity. We formulate a latent-variable model (LVM) of simultaneous 
equations where biomarkers, given the latent obesity, are independent in a measurement model, and 
the obesity is regressed on covariates in a structural model [10-19]. Given completely observed 
covariates and biomarkers having ignorable missing data [20], the LVM may be estimated by
maximum likelihood (ML) via standard LVM software such as Amos [21], EQS [22], and Mplus
[23].

This paper focuses on a longitudinal multilevel model where occasions at level 1 are nested 
within individuals at level 2 and where missing data are present at both levels under the assump- 
tion of ignorable missing data [20, 24]. Roy and Lin [25] estimated a longitudinal LVM given 
nonignorable dropouts and level-1 covariates missing not at random by ML. Das et al. [26] es- 
timated a structural equation model by a Markov Chain Monte Carlo method where continuous 
responses and covariates at level 1 may be missing at random in the measurement model. Both 
approaches handle level-1 outcomes and covariates subject to missingness. 

Recent advances enable efficient handling of missing data in a hierarchical linear model (HLM) 
by ML [27-30] or by Bayesian approaches [27, 31-33]. Shin and Raudenbush [28] formulated 
a univariate HLM as a joint normal distribution of variables, including the outcome, subject to 
missingness conditional on completely observed covariates. The authors estimated the joint model 
by ML via the EM algorithm [34], and then transformed the estimated joint model to the HLM. 
They showed that the unconstrained joint model, in general, over-identifies the HLM and that the 
over-identified HLM leads to biased inferences. Therefore, the authors estimated a constrained 
joint model to just identify the HLM for unbiased estimation. The method, however, cannot be 
used for the complicated LVM of simultaneous equations. In this paper, we extend the method to 
the LVM where multiple biomarkers and covariates are subject to missingness at any of the levels. 

We analyze the LVM given biomarkers and covariates that are subject to missingness with 
a general missing pattern at any of the levels. A conventional method for efficient handling of 
the missing data is to reexpress the LVM as a joint distribution of the variables, including the 
biomarkers, that are subject to missingness conditional on all of the covariates that are completely 
observed, and estimate the joint model which is then transformed to the LVM. The unconstrained 
joint model, however, identifies more parameters than desired in the LVM. Furthermore, the LVM 
is not nested within the joint model, in general. The consequence is that the over-identified joint 
model leads to biased estimation of the LVM. This paper explains how to characterize the joint
model so that it is a one-to-one transformation of the LVM for unbiased estimation. To yield
unbiased estimation of the LVM while handling missing data efficiently, we estimate the constrained
joint model according to the LVM within each iteration of the EM algorithm.

The next section introduces an LVM of our interest given incomplete data. Section 3 explains 
a joint model for efficient handling of missing data and shows how to impose proper constraints 
on the joint model for unbiased estimation of the LVM. Section 4 describes the EM algorithm for 
efficient handling of the constrained joint model. Section 5 simulates an LVM to show that the 
conventional method produces biased estimation of the LVM and that our approach corrects the 
bias. Section 6 illustrates unbiased and efficient analysis of the desired LVM given the NGHS data.
Section 7 discusses the limitations and future extensions of our method.


2. Latent Variable Model 


This section introduces the LVM of interest [15]. The structural model is

$$U_{ik} = A_{ik}^T \alpha + B_{ik}^T b_i + \epsilon_{ik}, \qquad b_i \sim N(0, D), \tag{1}$$

where $U_{ik}$ is a univariate latent obesity score, $A_{ik}$ is a vector of covariates having fixed effects $\alpha$,
$B_{ik}$ is a vector of known covariates having level-2 unit-specific random effects $b_i$ independent of
a level-1 unit-specific random error $\epsilon_{ik}$, level-1 unit or occasion $k$ is nested within level-2 unit
or subject $i$ for $k = 1, \ldots, k_i$ and $i = 1, \ldots, n$, and $D$ is a positive definite matrix. This model
cannot be directly estimated because $U_{ik}$ is unobservable. However, $U_{ik}$ is related to biomarkers by a
measurement model

$$R_{ik} = \gamma_0 + \gamma_1 U_{ik} + a_i + e_{ik}, \tag{2}$$

where $R_{ik}$ is a vector of $J$ biomarkers, $\gamma_0 = [\gamma_{01}\ \gamma_{02} \cdots \gamma_{0J}]^T$ is a vector of $J$ intercepts, $\gamma_1 =
[\gamma_{11}\ \gamma_{12} \cdots \gamma_{1J}]^T$ is a vector of the $J$ effects or factor loadings of $U_{ik}$, and subject-specific random
effects $a_i \sim N(0, \oplus_{j=1}^{J} \xi_j)$ are independent of level-1 random errors $e_{ik} \sim N(0, \oplus_{j=1}^{J} \sigma_j^2)$
for a diagonal matrix $\oplus_{j=1}^{J} w_j = \mathrm{diag}(w_1, w_2, \ldots, w_J)$ with diagonal elements or submatrices
$(w_1, w_2, \ldots, w_J)$ and all other elements equal to zero. To make parameters identifiable in the
model (1), we assume that $\mathrm{var}(\epsilon_{ik}) = 1$ and that $A_{ik}$ does not contain an intercept. Note that the
$j$th and $j'$th biomarkers of subject $i$ at occasion $k$ are correlated and their covariance is equal to
$\gamma_{1j} \gamma_{1j'} \mathrm{var}(U_{ik})$.
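The covariance structure implied by the measurement model (2) can be checked numerically. The following is a minimal sketch, assuming diagonal covariance matrices for $a_i$ and $e_{ik}$ as stated above; the function name and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def biomarker_covariance(gamma1, var_u, var_a, var_e):
    """Implied covariance of R_ik = gamma0 + gamma1 * U_ik + a_i + e_ik.

    var_a and var_e hold the diagonal elements of the (diagonal)
    covariance matrices of a_i and e_ik.
    """
    gamma1 = np.asarray(gamma1, dtype=float)
    return var_u * np.outer(gamma1, gamma1) + np.diag(var_a) + np.diag(var_e)

# Hypothetical loadings and variances for J = 3 biomarkers
gamma1 = np.array([1.0, 0.8, 1.2])
cov_r = biomarker_covariance(gamma1, var_u=2.0,
                             var_a=[0.3, 0.2, 0.1],
                             var_e=[0.5, 0.4, 0.6])
```

Each off-diagonal element of `cov_r` equals the product of the two loadings and the variance of the latent score, while the diagonal adds the two biomarker-specific variance components.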

Our goal is to identify the obesity factors $A_{ik}$ and $B_{ik}$, and explain their associations with the
obesity $U_{ik}$ by efficient analysis of the LVM, that is, by analyzing all available sample data without
dropping any observations. The challenge is to efficiently handle missing data in $(R_{ik}, A_{ik})$,
which is explained in the next section. In this paper, we refer to the associations as the “effects”
of the factors, but do not mean causality. Such use of the term “effects” is pervasive in the literature.


3. Missing Data 

To handle missing data in $R_{ik}$ and $A_{ik}$ efficiently, we reparameterize the LVM in terms of a joint
distribution of the response variables $R_{ik}$ and all covariates subject to missingness in $A_{ik}$ conditional
on all covariates completely observed. Because $A_{ik}$ may have covariates subject to missingness
as well as covariates completely observed, we decompose $A_{ik} = [S_{ik}^T\ Y_{2i}^T\ W_{1ik}^T\ W_{2i}^T]^T$ where
$p_1$-vector $S_{ik}$ and $p_2$-vector $Y_{2i}$ are level-1 and -2 covariates subject to missingness, respectively,
and $p_3$-vector $W_{1ik}$ and $p_4$-vector $W_{2i}$ are level-1 and -2 covariates completely observed, respectively.
Then, the joint model is a multivariate distribution of level-1 $Y_{1ik} = [R_{ik}^T\ S_{ik}^T]^T$ and level-2
$Y_{2i}$ that are subject to missingness conditional on $W_{1ik}$, $W_{2i}$ and $B_{ik}$ that are completely observed.
In this section, we explain that this joint model over-identifies the LVM, in general. The consequence
is biased estimation of the LVM as will be illustrated in Section 5. For a positive integer
$m$, let $I_m$ and $1_m$ denote an $m$-by-$m$ identity matrix and a vector of $m$ unities, respectively.


3.1. Over-identification Problem 

If we were able to observe $U_{ik}$, we would directly analyze the structural model (1) without involving
the measurement model (2). To analyze all observed data in the model (1), we would estimate
the multivariate distribution of $(U_{ik}, S_{ik}, Y_{2i})$ given completely observed $(W_{1ik}, W_{2i}, B_{ik})$. In this
simple case, we are able to not only reveal the over-identification problem explicitly, but also explain
how to correct the problem clearly. In the following subsection, we extend the multivariate
distribution to efficient handling of missing data in $(Y_{1ik}, Y_{2i})$ conditional on $(W_{1ik}, W_{2i}, B_{ik})$ for
the general LVM.

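If $U_{ik}$ were observed, efficient handling of the missing data in the desired model (1) might be
achieved, without the measurement model (2), by

$$\begin{bmatrix} U_{ik} \\ S_{ik} \\ Y_{2i} \end{bmatrix} = \begin{bmatrix} \beta_{u1}^T & \beta_{u2}^T \\ \beta_{s1} & \beta_{s2} \\ 0 & \beta_{22} \end{bmatrix} \begin{bmatrix} W_{1ik} \\ W_{2i} \end{bmatrix} + \begin{bmatrix} B_{ik}^T & 0 & 0 \\ 0 & I_{p_1} & 0 \\ 0 & 0 & I_{p_2} \end{bmatrix} \begin{bmatrix} b_{ui} \\ b_{si} \\ b_{2i} \end{bmatrix} + \begin{bmatrix} \epsilon_{uik} \\ \epsilon_{sik} \\ 0 \end{bmatrix}, \tag{3}$$

where $\beta_{u1}^T$ and $\beta_{s1}$ are 1-by-$p_3$ and $p_1$-by-$p_3$ matrices of the fixed effects of $W_{1ik}$ on $U_{ik}$ and $S_{ik}$,
respectively, $\beta_{u2}^T$, $\beta_{s2}$, and $\beta_{22}$ are 1-by-$p_4$, $p_1$-by-$p_4$ and $p_2$-by-$p_4$ matrices of the fixed effects of
$W_{2i}$ on $U_{ik}$, $S_{ik}$ and $Y_{2i}$, respectively, and

$$\begin{bmatrix} b_{ui} \\ b_{si} \\ b_{2i} \end{bmatrix} \sim N\left(0, \begin{bmatrix} \tau_{uu} & \tau_{us} & \tau_{u2} \\ \tau_{su} & \tau_{ss} & \tau_{s2} \\ \tau_{2u} & \tau_{2s} & \tau_{22} \end{bmatrix}\right) \text{ is independent of } \begin{bmatrix} \epsilon_{uik} \\ \epsilon_{sik} \end{bmatrix} \sim N\left(0, \begin{bmatrix} \Sigma_{uu} & \Sigma_{us} \\ \Sigma_{su} & \Sigma_{ss} \end{bmatrix}\right).$$

We center level-1 $S_{ik}$ and $W_{1ik}$ around their respective sample means and level-2 $Y_{2i}$ and $W_{2i}$ around
their respective weighted sample means in Equation (3), except for $B_{ik}$, which is centered around its
group mean for precise estimation of the variance matrix [35]. The centering ensures that we identify
the model (1) with no intercept and model (2). Shin and Raudenbush [28] expressed
$[\beta_{u1}^T\ \beta_{u2}^T][W_{1ik}^T\ W_{2i}^T]^T = \beta_u^T W_{uik}$, $\beta_{s1} W_{1ik} + \beta_{s2} W_{2i} = (I_{p_1} \otimes W_{uik}^T)\beta_s$ and
$\beta_{22} W_{2i} = (I_{p_2} \otimes W_{2i}^T)\beta_2$, and efficiently estimated the model (3) by ML via
the EM algorithm where $U_{ik}$ was observable.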

Although the conditional model (1) expresses a single effect of each covariate in $S_{ik}$ on $U_{ik}$,
the multivariate model (3) expresses a distinct covariance at each level between the covariate and
obesity, and thereby identifies $p_1$ more parameters than desired in the model (1). The two distinct
covariances identify the within-child association between the time-varying covariate and outcome,
which may differ from the between-child association, that is, the association between the child-mean
covariate and outcome. The associations identify a contextual effect of the covariate that is defined
as the difference between the between- and within-child associations [35, 36]. Controlling for the
within-child association, the contextual effect explains the expected difference in obesity between
two children who have the same value of the covariate at an occasion, but who differ by one unit
in their child-mean covariates. Consequently, the multivariate model identifies a contextual effects
model where each covariate in $S_{ik}$ has a contextual effect, controlling for the within-child effect
of the covariate [29]. Because the model (1) expresses no contextual effect of the covariate, implying
identical between- and within-child associations between the covariate and outcome [36],
the multivariate model (3) over-identifies the model (1) and expresses the single effect of each
covariate in $S_{ik}$ as a weighted average of the two associations [30, 35]. The weighted average is
different from the single effect when model (1) is directly estimated [35, 36]. The consequence
is that the desired model (1) is not nested within or congenial to the multivariate model (3) [37].
The over-identified model (3) yields biased estimation of the desired model (1) unless constraints
are imposed on the model (3) [28]. We illustrate the over-identification problem causing biased
estimation by a simulation study in Section 5.
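The distinction between within-child, between-child and pooled associations can be illustrated with a small simulation. This is a minimal sketch, not the estimation method of this paper: it uses ordinary least-squares slopes on fully observed data, and the coefficients `beta_within` and `beta_between`, the sample sizes and the seed are hypothetical values chosen so that the two associations differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 2000, 5                        # hypothetical numbers of children and occasions
beta_within, beta_between = 1.0, 3.0  # hypothetical coefficients on the two components

s_child = rng.normal(size=n)          # child-level component of the covariate
s_dev = rng.normal(size=(n, k))       # occasion-level deviations
s = s_child[:, None] + s_dev
u = beta_between * s_child[:, None] + beta_within * s_dev + rng.normal(size=(n, k))

def slope(x, y):
    """Least-squares slope of y on x for one-dimensional arrays."""
    x = x - x.mean()
    y = y - y.mean()
    return float((x * y).sum() / (x * x).sum())

s_bar = s.mean(axis=1, keepdims=True)  # child-mean covariate
u_bar = u.mean(axis=1, keepdims=True)  # child-mean outcome

within = slope((s - s_bar).ravel(), (u - u_bar).ravel())
between = slope(s_bar.ravel(), u_bar.ravel())
pooled = slope(s.ravel(), u.ravel())
contextual = between - within          # contextual effect as defined in the text
```

In this sketch the pooled slope falls between the within-child and between-child slopes, which is the weighted-average behavior described above; a model that assumes a single effect would report only that blend.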

In order to correct the bias, we impose $p_1$ constraints on the model (3) so that it represents
a one-to-one transformation of the LVM. For clarity, we describe the constraints for a random-intercept
model (1) having $B_{ik} = 1$. Appendix A explains the constraints for a random-coefficient
model (1). To simplify the notation, let

$$\mathrm{cov}(b_{ui}, b_{si} \mid b_{2i}) = \begin{bmatrix} \tau_{uu|2} & \tau_{us|2} \\ \tau_{su|2} & \tau_{ss|2} \end{bmatrix}.$$

Given $Y_{2i}$, we constrain the association between $U_{ik}$ and each covariate in $S_{ik}$ to be equal across
levels, i.e.

$$\tau_{us|2}\, \tau_{ss|2}^{-1} = \Sigma_{us}\, \Sigma_{ss}^{-1}, \tag{4}$$

which says that the association between $U_{ik}$ and each of the level-1 covariates is the same at each
level given $Y_{2i}$. The constraints imply $\mathrm{cov}(U_{ik}, S_{ik} \mid Y_{2i})[\mathrm{var}(S_{ik} \mid Y_{2i})]^{-1} = (\tau_{us|2} + \Sigma_{us})(\tau_{ss|2} +
\Sigma_{ss})^{-1} = \alpha_1^T$ for $\tau_{us|2} = \alpha_1^T \tau_{ss|2}$ and $\Sigma_{us} = \alpha_1^T \Sigma_{ss}$, and the one-to-one transformations between
the LVM and the multivariate model (3)

$$\alpha_1 = \Sigma_{ss}^{-1} \Sigma_{su}, \qquad \alpha_2 = \tau_{22}^{-1}(\tau_{2u} - \tau_{2s}\alpha_1), \qquad \alpha_3 = \beta_{u1} - \beta_{s1}^T \alpha_1, \tag{5}$$
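The constraint (4) and the first two transformations in (5) can be verified numerically in a scalar case. The following is a minimal sketch, assuming a random-intercept LVM with one level-1 covariate ($p_1 = 1$), one level-2 covariate subject to missingness ($p_2 = 1$) and no completely observed covariates; every numeric value is hypothetical. It builds the joint-model moments implied by the LVM and then recovers the LVM coefficients from them.

```python
# Hypothetical LVM coefficients: U = a1 * S + a2 * Y2 + b + eps
a1, a2 = 0.7, -0.4

# Hypothetical level-2 (co)variances of (b_s, b_2) and level-1 variance of eps_s
tau_ss, tau_s2, tau_22 = 2.0, 0.6, 1.2
sig_ss = 0.9

# Joint-model moments implied by the LVM
tau_us = a1 * tau_ss + a2 * tau_s2   # cov(b_u, b_s)
tau_u2 = a1 * tau_s2 + a2 * tau_22   # cov(b_u, b_2)
sig_us = a1 * sig_ss                 # cov(eps_u, eps_s)

# Level-2 moments conditional on b_2
tau_us_2 = tau_us - tau_u2 * tau_s2 / tau_22
tau_ss_2 = tau_ss - tau_s2 ** 2 / tau_22

# Constraint (4): the level-2 and level-1 associations coincide
level2_slope = tau_us_2 / tau_ss_2
level1_slope = sig_us / sig_ss

# Transformations (5), scalar case
alpha1 = sig_us / sig_ss
alpha2 = (tau_u2 - tau_s2 * alpha1) / tau_22
```

Because the moments were generated from an LVM with a single effect of the covariate, the two slopes agree and the transformations return the coefficients that were put in. An unconstrained joint model would be free to estimate different values for `level2_slope` and `level1_slope`.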

3.2. Efficient Handling of Missing Data

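Because $U_{ik}$ is unobservable, we need to estimate the measurement model (2) in addition to the
desired model (1). Because observed biomarkers are also subject to missingness, the multivariate
model (3) cannot handle the missing data in both $A_{ik}$ and $R_{ik}$. Instead, we formulate the joint
distribution of $(R_i, S_i, Y_{2i})$ subject to missingness given completely observed covariates for $R_i =
[R_{i1}^T\ R_{i2}^T \cdots R_{ik_i}^T]^T$ and $S_i = [S_{i1}^T\ S_{i2}^T \cdots S_{ik_i}^T]^T$ based on the aggregated models (2) and (3)

$$\begin{bmatrix} R_i \\ S_i \\ Y_{2i} \end{bmatrix} = \begin{bmatrix} 1_{k_i} \otimes \gamma_0 + (W_{ui}\beta_u + B_i b_{ui} + \epsilon_{ui}) \otimes \gamma_1 \\ W_{si}\beta_s + (1_{k_i} \otimes I_{p_1}) b_{si} + \epsilon_{si} \\ X_{2i}\beta_2 + b_{2i} \end{bmatrix} + \begin{bmatrix} 1_{k_i} \otimes a_i \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} e_i \\ 0 \\ 0 \end{bmatrix}, \tag{6}$$

where $W_{si} = [I_{p_1} \otimes W_{ui1}\ \ I_{p_1} \otimes W_{ui2} \cdots I_{p_1} \otimes W_{uik_i}]^T$, $\epsilon_{si} = [\epsilon_{si1}^T\ \epsilon_{si2}^T \cdots \epsilon_{sik_i}^T]^T$, and
$X_{2i} = I_{p_2} \otimes W_{2i}^T$. To derive estimators, we reexpress model (6) parsimoniously as

$$\begin{bmatrix} Y_{1i} \\ Y_{2i} \end{bmatrix} = \begin{bmatrix} X_{1i} & 0 \\ 0 & X_{2i} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} + \begin{bmatrix} Z_{1i} & 0 \\ 0 & I_{p_2} \end{bmatrix} \begin{bmatrix} b_{1i} \\ b_{2i} \end{bmatrix} + \begin{bmatrix} \epsilon_i \\ 0 \end{bmatrix} + \begin{bmatrix} a_{1i} + e_{1i} \\ 0 \end{bmatrix}, \tag{7}$$

where the covariance matrix of $(b_{1i}, b_{2i})$ is partitioned conformably into blocks $T_{11}$, $T_{12}$ and $T_{22} = \tau_{22}$.
Note that the joint model (7) enables us to analyze a subject who has at least a
single value observed in $(Y_{1i}, Y_{2i})$ for efficient analysis of the LVM.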

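To efficiently handle missing data, let $O_{1i}$ and $O_{2i}$ be matrices of the observed value indicators
(1 if observed, 0 otherwise) in $Y_{1i}$ and $Y_{2i}$, respectively, such that they extract all observed data
$Y_{1i}^o = O_{1i} Y_{1i}$ and $Y_{2i}^o = O_{2i} Y_{2i}$ from $Y_{1i}$ and $Y_{2i}$, respectively [28]. The model (7) for the observed
data is

$$\begin{bmatrix} Y_{1i}^o \\ Y_{2i}^o \end{bmatrix} = \begin{bmatrix} X_{1i}^o & 0 \\ 0 & X_{2i}^o \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} + \begin{bmatrix} Z_{1i}^o & 0 \\ 0 & O_{2i} \end{bmatrix} \begin{bmatrix} b_{1i} \\ b_{2i} \end{bmatrix} + \begin{bmatrix} \epsilon_i^o + a_{1i}^o + e_{1i}^o \\ 0 \end{bmatrix} \tag{8}$$

for $X_{1i}^o = O_{1i} X_{1i}$, $X_{2i}^o = O_{2i} X_{2i}$, $Z_{1i}^o = O_{1i} Z_{1i}$, $a_{1i}^o = O_{1i} a_{1i}$, $\epsilon_i^o = O_{1i} \epsilon_i$, and $e_{1i}^o = O_{1i} e_{1i}$. We
reexpress the model (8) parsimoniously as $Y_i^o \sim N(\mu_i^o, V_i^o)$ for $Y_i^o = [Y_{1i}^{oT}\ Y_{2i}^{oT}]^T$,

$$\mu_i^o = \begin{bmatrix} X_{1i}^o \beta_1 \\ X_{2i}^o \beta_2 \end{bmatrix}, \qquad V_i^o = \begin{bmatrix} Z_{1i}^o T_{11} Z_{1i}^{oT} + O_{1i}[\mathrm{var}(\epsilon_i) + \mathrm{var}(a_{1i}) + \mathrm{var}(e_{1i})]O_{1i}^T & Z_{1i}^o T_{12} O_{2i}^T \\ O_{2i} T_{12}^T Z_{1i}^{oT} & O_{2i} T_{22} O_{2i}^T \end{bmatrix}, \tag{9}$$

where $T_{11} = \mathrm{var}(b_{1i})$, $T_{12} = \mathrm{cov}(b_{1i}, b_{2i})$ and $T_{22} = \mathrm{var}(b_{2i})$.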
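The role of the observed-value indicator matrices can be sketched in a few lines. This is a minimal illustration under the assumption that each indicator matrix consists of the rows of an identity matrix corresponding to observed entries; the function name `observed_selector` and all numeric values are hypothetical.

```python
import numpy as np

def observed_selector(observed):
    """Rows of the identity matrix that correspond to observed entries."""
    observed = np.asarray(observed, dtype=bool)
    return np.eye(observed.size)[observed]

# Hypothetical complete-data vector with two missing entries (NaN)
y = np.array([2.0, np.nan, 5.0, np.nan])
mu = np.array([1.0, 2.0, 3.0, 4.0])
V = np.array([[4.0, 1.0, 0.5, 0.2],
              [1.0, 3.0, 0.3, 0.1],
              [0.5, 0.3, 2.0, 0.4],
              [0.2, 0.1, 0.4, 1.0]])

O = observed_selector(~np.isnan(y))
y_obs = O @ np.nan_to_num(y)   # missing entries are zeroed, then dropped by O
mu_obs = O @ mu                # mean of the observed sub-vector
V_obs = O @ V @ O.T            # covariance of the observed sub-vector
```

Pre- and post-multiplying by the selector keeps exactly the rows and columns of the complete-data mean and covariance that belong to observed values, which is how the observed-data model is obtained from the complete-data model.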

4. Estimation via the EM Algorithm

This section sketches efficient estimation of the joint model (7) by a modified application of the EM
algorithm [34]. See Appendices B, C and D for details. The modification is that, within each iteration
of the EM algorithm, we efficiently estimate the LVM to find the constraints (4) that are then imposed
on the joint model (7). We view $(Y_{1i}, Y_{2i}, U_i, b_{ui}, b_{si}, a_i)$ as complete data and
$Y_i^o$ as observed within unit $i$ for $U_i = [U_{i1}\ U_{i2} \cdots U_{ik_i}]^T$. The constraints (4) require that the parameters
$\alpha$ of the LVM be estimated. Within each iteration of the EM algorithm, we estimate the parameters
$\alpha$ and translate them into the parameters of the joint model (7) according to the transformations (5).

To estimate $\alpha$, let $A_i = [A_{i1}\ A_{i2} \cdots A_{ik_i}]^T$ and $\epsilon_i = [\epsilon_{i1}\ \epsilon_{i2} \cdots \epsilon_{ik_i}]^T$ for the LVM, and
$b_{1i} = [b_{ui}^T\ b_{si}^T]^T$, $b_i = [b_{1i}^T\ b_{2i}^T]^T$, and $\beta_1 = [\beta_u^T\ \beta_s^T]^T$ for the joint model. Each iteration
updates the complete-data ML estimators for the structural model (1) and for the joint model (7).
At the E step, we obtain the conditional expectations of the complete-data sufficient statistics,
including $E(A_{ik} A_{ik}^T \mid Y_i^o)$ and $E(b_i b_i^T \mid Y_i^o)$, from the distribution of
$(Y_{1i}, Y_{2i}, U_i, \epsilon_i, b_i, a_i)$ given $Y_i^o$. Let $V(A)$ denote a vector of distinct elements in a
variance-covariance matrix $A$. At convergence, the Fisher information matrix is obtained from the
observed log-likelihood of the parameters of the constrained joint model (7), in which
variance-covariance matrices such as $\tau_{ss}$, $\tau_{22}$ and $\Sigma_{ss}$ enter through their distinct elements
$V(\tau_{ss})$, $V(\tau_{22})$ and $V(\Sigma_{ss})$. The variance matrix
associated with the parameter estimates in the constrained joint model (7) is produced by inverting
the Fisher information matrix. We obtain the point estimates and the standard errors associated
with the parameters of the LVM by the invariance property of MLEs and the multivariate delta method,
respectively.
The next two sections illustrate the method by analyses of simulated and NGHS data. The 
convergence is taken to be the difference in the observed log-likelihoods between two consecutive 


iterations less than 10~°. 
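The delta-method step that converts the variance matrix of the joint-model estimates into standard errors for LVM parameters can be sketched as follows. This is a generic illustration with a numerically differentiated transformation, not the SAS IML implementation used for the analyses; the function name `delta_method_se`, the parameter estimates and their covariance matrix are all hypothetical.

```python
import numpy as np

def delta_method_se(g, theta, cov_theta, step=1e-6):
    """Return g(theta) and its delta-method standard errors.

    The Jacobian of g is approximated by central finite differences.
    """
    theta = np.asarray(theta, dtype=float)
    g0 = np.atleast_1d(g(theta))
    jac = np.empty((g0.size, theta.size))
    for j in range(theta.size):
        bump = np.zeros_like(theta)
        bump[j] = step
        upper = np.atleast_1d(g(theta + bump))
        lower = np.atleast_1d(g(theta - bump))
        jac[:, j] = (upper - lower) / (2.0 * step)
    cov_g = jac @ cov_theta @ jac.T
    return g0, np.sqrt(np.diag(cov_g))

# Hypothetical joint-model estimates theta = (Sigma_su, Sigma_ss) with p1 = 1,
# transformed to alpha_1 = Sigma_su / Sigma_ss as in the scalar case of (5)
theta_hat = np.array([0.63, 0.90])
cov_hat = np.array([[0.0004, 0.0001],
                    [0.0001, 0.0009]])
alpha1_hat, alpha1_se = delta_method_se(lambda t: t[0] / t[1], theta_hat, cov_hat)
```

The point estimate follows from the invariance property (the transformation is simply evaluated at the joint-model estimates), while the standard error comes from the first-order approximation that pre- and post-multiplies the covariance matrix by the Jacobian.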


5. Simulation 

In this section, we simulate a simple LVM involving two biomarkers ($J = 2$), a level-1 covariate
$S_{ik}$, and a level-2 covariate $W_{2i}$. The goal is to show that given $W_{2i}$, the over-identified joint model
(7) of $(R_{ik}, S_{ik})$ leads to biased estimation of the LVM and that the constrained joint model (7),
according to equations (4), corrects the bias. Next, we simulate ignorable missing data to show
that our method via the constrained joint model estimates the desired LVM well given incomplete
data.


5.1. Over-identification Problem 


Five occasions ($k_i = 5$) are nested within each of 1000 subjects ($n = 1000$) in the simulated LVM

$$\begin{aligned} U_{ik} &= S_{ik} + W_{2i} + b_i + \epsilon_{ik}, & b_i &\sim N(0, 1),\ \ \epsilon_{ik} \sim N(0, 1), \\ R_{ik} &= 1_2 + 1_2 U_{ik} + a_i + e_{ik}, & a_i &\sim N(0, 0.25 I_2),\ \ e_{ik} \sim N(0, 0.25 I_2), \end{aligned} \tag{11}$$

where $\alpha_2 = \alpha_3 = 0$, $\alpha_1 = \alpha_4 = D = \gamma_{01} = \gamma_{02} = \gamma_{11} = \gamma_{12} = 1$, the diagonal variance components
of $a_i$ and $e_{ik}$ all equal 0.25, $S_{ik} \sim N(0, 1)$, and $W_{2i} \sim$ Bernoulli(0.5). We simulate the model with no
missing data because the corresponding unconstrained joint model (7) identifies more parameters than
desired and yields biased estimation of the LVM regardless of whether there are missing data or not.
Given the simulated data, we estimate the LVM (11) by three different ML methods via the EM algorithm:
direct estimation of the LVM given complete data; estimation of the corresponding constrained
joint model (7), according to Equations (4), which is then transformed to the LVM; and estimation
of the unconstrained joint model that is transformed to the LVM. We call the three approaches
benchmark, just-identified and over-identified estimation methods. An estimation method works
well if it produces all point estimates close to the benchmark counterparts. Note that we do not
simulate missing data because the complete data analysis illustrates the over-identification problem
and the consequential biased estimation.
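A data set with the structure of the simulated LVM (11) can be generated as in the following minimal sketch. The function name `simulate_lvm`, the array layout and the seed are hypothetical; this is not the code used for Table 1, and it only produces complete data.

```python
import numpy as np

def simulate_lvm(n=1000, k=5, seed=0):
    """Draw complete data from the simulated LVM (11)."""
    rng = np.random.default_rng(seed)
    s = rng.normal(size=(n, k))                # level-1 covariate S_ik ~ N(0, 1)
    w2 = rng.binomial(1, 0.5, size=n)          # level-2 covariate W_2i ~ Bernoulli(0.5)
    b = rng.normal(size=n)                     # random intercept b_i, variance D = 1
    eps = rng.normal(size=(n, k))              # level-1 error with unit variance
    u = s + w2[:, None] + b[:, None] + eps     # latent score U_ik
    a = rng.normal(scale=0.5, size=(n, 1, 2))  # subject effects a_i, variance 0.25
    e = rng.normal(scale=0.5, size=(n, k, 2))  # occasion errors e_ik, variance 0.25
    r = 1.0 + u[:, :, None] + a + e            # two biomarkers, intercepts and loadings 1
    return {"S": s, "W2": w2, "U": u, "R": r}

data = simulate_lvm()
```

The returned dictionary holds the biomarkers as an array indexed by subject, occasion and biomarker. The latent score is kept only for checking purposes, since an analysis of such data would treat it as unobserved.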

Table 1 displays the results. The benchmark estimates are shown under column heading
“Benchmark”. All point estimates are close to their true values. The standard errors are very small.
The just-identified LVM estimates and their standard errors in the next column under heading
“Just-identified” are identical to the benchmark counterparts. The last column under “Over-identified”
shows over-identified LVM estimates. It is apparent that all point estimates of the model (1) and
their standard errors are comparatively underestimated while the effects of $U_{ik}$ and their standard
errors in the model (2) appear overestimated relative to the benchmark counterparts.


5.2. Missing Data 

To compare the performance of the just-identified and over-identified estimations given incomplete
data, we simulate ignorable missing values in $(R_{ik}, S_{ik})$ in the simulated data set of Table 1. Let $M_{Rik}$
be 1 if $R_{ik}$ is missing, and 0 otherwise. We define $M_{Sik}$ for $S_{ik}$ likewise, and draw missing values
according to

$$\mathrm{logit}(p_i) = 1 + W_{2i} + \delta_i, \qquad \delta_i \sim N(0, 1)$$

for the $W_{2i}$ simulated completely observed so that

$$M_{Rik} \sim \mathrm{binomial}(k_i, p_i) \ \text{ if } \mathrm{logit}(p_i) > t_1, \qquad M_{Sik} \sim \mathrm{binomial}(k_i, 1 - p_i) \ \text{ if } \mathrm{logit}(p_i) < t_2.$$

We set thresholds $t_1 = 2.09$ and $t_2 = 0.91$, which are equal to the 70th and 30th percentiles of
$\mathrm{logit}(p_i)$, respectively. Consequently, we drop 28.14% and 13.14% of $R_{ik}$ and $S_{ik}$, respectively.
Note that the parameters of LVM (11) are distinct from those of the missing data mechanism above.
Then, the missing values are missing at random or ignorable because the missing data mechanism
depends on completely observed covariate $W_{2i}$ [20].
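The missing data mechanism can be sketched as follows. This is a hedged illustration rather than the code behind Table 2: it assumes that each occasion's missing indicator is an independent Bernoulli draw with the subject's probability, that subjects above the upper threshold may lose biomarker values, and that subjects below the lower threshold (the 30th percentile) may lose covariate values. The function name `simulate_missingness` and the seeds are hypothetical.

```python
import numpy as np

def simulate_missingness(w2, k=5, t1=2.09, t2=0.91, seed=1):
    """Return boolean missing indicators for R_ik and S_ik (True = missing)."""
    rng = np.random.default_rng(seed)
    n = w2.size
    logit = 1.0 + w2 + rng.normal(size=n)      # logit(p_i) = 1 + W_2i + delta_i
    p = 1.0 / (1.0 + np.exp(-logit))
    draw_r = rng.uniform(size=(n, k))
    draw_s = rng.uniform(size=(n, k))
    # Biomarkers may be missing only for subjects above the upper threshold
    m_r = (logit > t1)[:, None] & (draw_r < p[:, None])
    # The level-1 covariate may be missing only below the lower threshold
    m_s = (logit < t2)[:, None] & (draw_s < (1.0 - p)[:, None])
    return m_r, m_s

w2 = np.random.default_rng(0).binomial(1, 0.5, size=1000)
m_r, m_s = simulate_missingness(w2)
rate_r, rate_s = m_r.mean(), m_s.mean()
```

Under these assumptions the two missing rates come out in the neighborhood of the percentages reported above, and no subject loses both biomarker and covariate values because the two threshold conditions cannot hold at once.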

The estimated LVMs appear in Table 2 under the same column headings as those of Table 1.
Both just-identified and over-identified point estimates are close to their complete-data counterparts
in Table 1. Due to the missing values, however, the standard errors are inflated relative to
their complete-data counterparts, in general. Therefore, the just-identified LVM estimates appear
unbiased under the simulated missing rate.


6. Analysis of NGHS Data 

Now, we estimate a just-identified LVM to analyze the NGHS data. Each subject in the study 
was scheduled to visit a clinic for measurement once a year, but a number of subjects had item- 
nonresponse, or missed their visits to produce unit-nonresponse. We analyze all these subjects, in- 
cluding those having unit-nonresponse, in the joint model (7) as they have at least person-specific 
characteristics observed to strengthen the inferences at level 2 [29]. Table 3 summarizes the lon- 
gitudinal data for analysis where level-1 variables are time-varying while level-2 variables are 
individual-level or base-line characteristics. The biomarkers have high correlations ranging from 
0.81 to 0.92 as shown in Table 7. We reason that the high positive correlations result because they 
are the biomarkers of obesity. Previous studies identified influential covariates of the biomarkers
as age (Age), race or ethnicity (Race), single-parent family (OneParent), maturation categorizing
prepuberty, puberty, post-menarche, and > 2 years after post-menarche (Maturation), maximum
parental education categorizing high school or less, and some college or more (ParentEd), household
yearly income (Income, categorizing < $19,999, $20,000 to $39,999, and > $40,000), the
weekly number of hours of TV watching (TV), overall physical activity pattern score (PhysicalAct,
the higher, the more physically active), and mother’s BMI (MotherBMI). Maturation and Income
are coded as 0, 1, 2, 3 and 0, 1, 2, respectively. Our preliminary analysis shows that the linear
associations between the coded covariates and obesity are reasonable. Specifically, we took the
first principal component of the biomarkers as the obesity outcome, explaining 91.4% of the total
variability in the biomarkers. Figure 1 plots the obesity outcome against the coded covariates,
revealing that the linear associations are reasonable. We analyze dummy indicator variables for
white girls (White), single-parent family (OneParent), and the maximum parent education
of some college or more (ParentEd). Except for Age, White, OneParent and ParentEd, the nine other
variables are missing up to 32% of their values.
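The preliminary step of taking the first principal component of several highly correlated biomarkers can be sketched on synthetic data. The values below are hypothetical stand-ins, not the NGHS measurements, and the sketch assumes the component is extracted from the correlation matrix of standardized biomarkers.

```python
import numpy as np

rng = np.random.default_rng(2)
latent = rng.normal(size=500)   # hypothetical underlying obesity score
# Four synthetic biomarkers that share the latent score plus independent noise
markers = np.column_stack([latent + 0.3 * rng.normal(size=500) for _ in range(4)])

# Standardize, then eigendecompose the correlation matrix
z = (markers - markers.mean(axis=0)) / markers.std(axis=0, ddof=1)
corr = np.corrcoef(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order

explained = eigvals[-1] / eigvals.sum()   # share of variability in the first component
pc1 = z @ eigvecs[:, -1]                  # first principal component scores
```

When the pairwise correlations are high, as they are for the biomarkers described above, the first component accounts for most of the total variability and tracks the shared latent score closely, up to an arbitrary sign.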

We use all available data to efficiently analyze a random-intercept LVM and a random-coefficient
LVM. The random-intercept LVM has $R_{ik}$ = [BMI Skinfold PercentFat Waist]$^T$, $S_{ik}$ = [Maturation
TV PhysicalAct]$^T$, $Y_{2i}$ = [MotherBMI Income]$^T$, $W_{1ik}$ = [Age Age$^2$ Age $\times$ White]$^T$, $W_{2i}$ = [ParentEd
White OneParent]$^T$, and $B_{ik} = 1$, while the random-coefficient model has every component
the same as the random-intercept counterpart except for $B_{ik}$ = [1 Age]$^T$ and

$$D = \begin{bmatrix} D_{00} & D_{01} \\ D_{10} & D_{11} \end{bmatrix}.$$

The estimated structural and measurement models of the random-intercept LVM appear in 
Tables 4 and 5, respectively. From the fitted structural model under column-heading “MAR” in 
Table 4, TV, Maturation, MotherBMI, Age, and OneParent are positively associated while Physi- 
calAct, quadratic Age, Age-by- White interaction and White are negatively associated with obesity, 
ceteris paribus. Controlling for other covariates, Income and ParentEd are not statistically signifi- 
cant, unlike previous studies [38-40]. The estimated measurement model in Table 5 shows that all 
biomarkers are highly significant and, thus, predictive of the latent obesity. 

The estimated random-coefficient LVM is also displayed in Tables 4 and 5. The last column
of Table 4 under column heading “MAR” shows the estimated structural model. The statistical
inferences on all fixed effects stay the same as they are in the random-intercept LVM. However,
the effects of linear and quadratic Age, Age-by-White interaction and White strengthen, compared
to the random-intercept counterparts. In particular, the negative gap of white girls’ obesity relative
to black girls’ triples. In addition, the variance of the random intercept in the random-coefficient
LVM doubles from that of the random-intercept LVM. The measurement model in Table 5 shows
that the obesity has attenuated effects on the biomarkers, compared with the random-intercept
counterparts. The likelihood ratio test for the null hypothesis $H_0: D_{01} = D_{11} = 0$ produces a
p-value < 0.01. Although the p-value is conservative [41-43], the small p-value reveals evidence
that the effect of age varies randomly across individuals. To confirm the evidence, we compute
Akaike’s Information Criterion (AIC) for the random-coefficient model, AIC$_1$ = 498,993.00, and the
AIC for the random-intercept model, AIC$_2$ = 507,572.40. The difference between AIC$_2$ and AIC$_1$,
$\Delta$AIC = 8579.4 > 10, also indicates that age has a random effect on the child obesity
[44].
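The AIC comparison amounts to a one-line calculation, sketched here with the two values reported above; the variable names are hypothetical.

```python
# AIC values reported in the text
aic_random_coefficient = 498_993.00
aic_random_intercept = 507_572.40

# A positive difference favors the random-coefficient model
delta_aic = aic_random_intercept - aic_random_coefficient
prefers_random_coefficient = delta_aic > 10
```

The difference exceeds the threshold of 10 used above by a wide margin, so the comparison is not sensitive to the exact cutoff.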

Figure 2 displays the effects of age for black and white girls based on the random-coefficient 
LVM. Adjusting for the effects of other covariates in the model, Age is positively associated with 
obesity [3, 45, 46]. However, we find that the positive association weakens more rapidly for white 
girls than for black girls toward the later stage of adolescence, thereby widening the racial gap in 
obesity between the two subpopulations of girls. The gap starts widening rapidly from about age 
14 where a 95% confidence interval for obesity is (0.05, 0.59). 

Table 4 compares the complete-case analysis under column heading “MCAR” with our missing 
data analysis under “MAR” of the random-intercept and -coefficient structural models. We dropped 
57.22% of occasions and 37.16% of subjects for the MCAR analyses. The estimated random in- 
tercept model under MCAR reveals that the effects of Maturation and Income are comparatively 
over-represented while the effect of Age-by-White interaction is relatively under-estimated. Fur- 
thermore, the statistical inferences of the complete-case analysis are relatively biased. The effect 
of Income is statistically significant under MCAR, but insignificant in our missing data analysis 
while the effects of quadratic Age, Age-by- White interaction, White and OneParent are statistically 
insignificant in the complete-case analysis, but significant under MAR. The biased inferences re- 
sult mainly because the standard errors of the complete-case analysis are up to 142.22% more 
inflated than the MAR counterparts. For analysis of the random-coefficient model, the complete-case
analysis over-represents the effects of Maturation and Income, but under-represents those of
Age-by-White interaction and White, relative to the MAR counterparts. The Age-by-White
interaction effect is statistically insignificant under MCAR, but significant under MAR. The biased
inference is due to the MCAR standard error that is 240% as large as the MAR counterpart.

Table 6 shows the complete-case analyses of the measurement models. The effects of obesity
on biomarkers and their standard errors are comparatively overestimated. Overall, the complete-case
analyses appear comparatively biased and inefficient.

7. Discussion

In this paper, we presented a maximum likelihood method for unbiased estimation of a latent 
variable model of simultaneous equations where biomarkers are related to latent obesity in a mea- 
surement equation and the latent obesity is regressed on covariates in a structural equation. Both 
covariates and biomarkers may be subject to missingness with a general missing pattern at any level 
of the hierarchy. The method handles missing data efficiently under an assumption of ignorable 
missing data. To handle missing data efficiently, we reexpressed the LVM as a joint distribution of 
the variables, including the biomarkers, subject to missingness conditional on completely observed 
covariates. The joint model, however, over-identifies the desired LVM when level-1 covariates are 
subject to missingness. The consequence is that the over-identified LVM may produce considerably 
biased inferences as was illustrated in Section 5. To overcome the problem of over-identification, 
we constrained the joint model to be a one-to-one transformation of the LVM, efficiently estimated 
the constrained joint model by ML via the EM algorithm and, then, transformed the estimated joint 
model to the LVM for unbiased and efficient estimation. We simulated an LVM to show that the 
just-identified LVM estimates are unbiased while the over-identified LVM counterparts are biased. 

We wrote a SAS IML program to estimate a constrained (and unconstrained) joint model, 
which was then transformed to the desired LVM via the one-to-one transformation formulas (5). 
The convergence criterion was the difference in observed log likelihoods between two consecutive
iterations, which was taken to less than 107°. 

An alternative to our efficient ML estimation of LVM (1) given incomplete data is multiple
imputation (MI) [47]. Given the estimated joint model (7), we may randomly draw multiple
imputations of the completed data for subsequent analysis of the LVM [28, 30]. The imputations
may include the latent obesity. To the best of our knowledge, existing statistical software packages
cannot impute the level-1 and level-2 missing data efficiently according to the joint model (7).
Therefore, researchers may be tempted to impute missing values with standard imputation software
such as SAS PROC MI and NORM [48], and then carry out complete-data analysis of the imputed
data with standard LVM software [47]. When MI for single-level data is applied to multilevel
data, the variance-covariance structure of the imputed data sets will not accurately represent the
multilevel process (7) that generated the data, nor will the structural relations at each level be
captured correctly. The resulting inferences may be substantially biased [48]. If MI is applied
correctly according to the data-generating process (7), subsequent complete-data analysis of the
LVM given the imputations will produce estimates of the LVM comparable to those from our
method. Both our ML method and the MI approach require efficient estimation of the joint model
(7). Following the estimation, our method requires a technical transformation of the joint model
to the LVM by the multivariate Delta method, while the MI approach includes the cumbersome
extra step of drawing imputations for subsequent complete-data analysis of the LVM [30].
However, once generation of the imputations is automated, the MI approach will be less technical
and thus accessible to a wide range of researchers. We would like to take on this research in the
near future.
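
For readers unfamiliar with the complete-data analysis step of MI, the following Python sketch shows the standard combining rules (Rubin's rules) for a single scalar parameter. It is an illustration only: the function name `pool_rubin` and the numbers are hypothetical, and it does not address how the imputations themselves are drawn from the joint model (7).

```python
import numpy as np


def pool_rubin(estimates, variances):
    """Combine m complete-data estimates of one scalar parameter.

    estimates: point estimates from the m imputed data sets
    variances: their squared standard errors
    Returns the pooled estimate and its standard error.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()                  # pooled point estimate
    within = u.mean()                 # average within-imputation variance
    between = q.var(ddof=1)           # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, np.sqrt(total)


# Hypothetical results from m = 3 imputed data sets.
pooled_est, pooled_se = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.04, 0.04])
```

The between-imputation term is what carries the extra uncertainty due to missing data into the pooled standard error.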

A limitation of the current approach is our assumption that the covariate having a random ef- 
fect is completely observed. When such a covariate has missing values, it should be modeled on 
the left-hand side of the joint model in order to handle missing data efficiently. At the same time, 
the covariate appears on the right-hand side of the joint model for estimation of the random ef- 
fect. Such a joint model is non-normal so that normal factorization of the joint model that leads to 
the desired LVM as a conditional distribution of biomarkers given covariates does not apply. One 
possible solution is a Bayesian approach where parameters are assumed to have their prior distri- 
butions, and the missing data are imputed from their posterior distributions given the parameters. 
Although the relaxed assumption will make our method more applicable, it is beyond the scope of 
the current research. 

Another limitation of our current approach is the multivariate normal joint model used to handle
missing data efficiently. We analyzed discrete covariates, household income and maturation stage,
subject to missingness. Although joint normality is not appropriate for such discrete missing
values, the identified model is the desired LVM we want to analyze [14, 16, 19]. The advantage
is that we analyze the covariates subject to missingness by the efficient missing data method
[11, 28, 29, 49]. Robust handling of a mixture of discrete and continuous missing data is on our
future research agenda.

Finally, we assumed the independence of biomarkers given obesity in the measurement model.
To see how plausible the assumption is for each LVM, we computed the correlations between
biomarkers implied by each fitted LVM and compared them to the corresponding sample
correlations. Table 7 reveals that the random-intercept LVM explains 54 to 89% of the sample
correlations while the random-coefficient LVM explains 62 to 93%. The random-coefficient LVM
explains a high 87 to 93% of the sample correlations in the three pairs involving waist
circumference, while it explains a comparatively low 62 to 72% of the sample correlations in the
other three pairs of biomarkers. Although the random-coefficient LVM does a better job of
explaining the sample correlations than the random-intercept LVM, it can be further improved, in
particular for the biomarker pairs that do not involve waist circumference, by relaxing the
independence assumption. Another way is to consider a more elaborate structural model having
autoregressive random effects of the latent child obesity, as obesity is likely to be correlated
between occasions within a person [50, 51].
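
The percentages quoted from Table 7 appear to be the ratio of the model-implied correlation to the sample correlation. The Python sketch below recomputes that ratio from the rounded correlations printed in Table 7, under that assumption; because the inputs are rounded to two decimals, the results differ slightly from the percentages in the table.

```python
# (sample correlation, RIM-implied correlation, RCM-implied correlation),
# copied from Table 7 as printed (two-decimal rounding).
table7 = {
    "BMI and Skinfold": (0.89, 0.48, 0.56),
    "BMI and WC": (0.92, 0.77, 0.83),
    "BMI and PBF": (0.82, 0.53, 0.59),
    "Skinfold and WC": (0.81, 0.65, 0.71),
    "Skinfold and PBF": (0.81, 0.45, 0.51),
    "WC and PBF": (0.81, 0.72, 0.76),
}


def percent_explained(sample_corr, model_corr):
    """Share of the sample correlation reproduced by the fitted model."""
    return 100.0 * model_corr / sample_corr


pct = {
    pair: (percent_explained(s, rim), percent_explained(s, rcm))
    for pair, (s, rim, rcm) in table7.items()
}

for pair, (rim_pct, rcm_pct) in pct.items():
    print(f"{pair:18s} RIM {rim_pct:5.1f}%  RCM {rcm_pct:5.1f}%")
```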
Appendix A
Transformation Formula Derivation


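It is easy to derive that the responses in models (1) and (3) are distributed as
\[
U_{ik} \mid S_{ik}, Y_{2i} \sim N(\mu_{1ik}, V_{1ik}), \qquad
[U_{ik} \;\; S_{ik}^T \;\; Y_{2i}^T]^T \sim N(\mu_{2ik}, V_{2ik}), \tag{a}
\]
where
\[
\mu_{1ik} = S_{ik}^T\alpha_1 + Y_{2i}^T\alpha_2 + W_{1ik}^T\alpha_3 + W_{2i}^T\alpha_4, \qquad
V_{1ik} = B_{ik}^T D B_{ik} + 1,
\]
\[
\mu_{2ik} = \begin{bmatrix}
\beta_{u1}^T W_{1ik} + \beta_{u2}^T W_{2i} \\
\beta_{s1} W_{1ik} + \beta_{s2} W_{2i} \\
\beta_{22} W_{2i}
\end{bmatrix}, \qquad
V_{2ik} = \begin{bmatrix}
B_{ik}^T T_{uu} B_{ik} + \Sigma_{uu} & B_{ik}^T T_{us} + \Sigma_{us} & B_{ik}^T T_{u2} \\
T_{su} B_{ik} + \Sigma_{su} & T_{ss} + \Sigma_{ss} & T_{s2} \\
T_{2u} B_{ik} & T_{2s} & T_{22}
\end{bmatrix}.
\]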


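Let us express model (3) such that it recognizes the latent random effect $b_{si}$ of $S_{ik}$, as
\[
[U_{ik} \;\; (S_{ik} - b_{si})^T \;\; b_{si}^T \;\; Y_{2i}^T]^T \sim N(\mu_{3ik}, V_{3ik}) \tag{b}
\]
with
\[
\mu_{3ik} = \begin{bmatrix}
\beta_{u1}^T W_{1ik} + \beta_{u2}^T W_{2i} \\
\beta_{s1} W_{1ik} + \beta_{s2} W_{2i} \\
0 \\
\beta_{22} W_{2i}
\end{bmatrix}, \qquad
V_{3ik} = \begin{bmatrix}
B_{ik}^T T_{uu} B_{ik} + \Sigma_{uu} & \Sigma_{us} & B_{ik}^T T_{us} & B_{ik}^T T_{u2} \\
\Sigma_{su} & \Sigma_{ss} & 0 & 0 \\
T_{su} B_{ik} & 0 & T_{ss} & T_{s2} \\
T_{2u} B_{ik} & 0 & T_{2s} & T_{22}
\end{bmatrix}.
\]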


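Then, a regression of $U_{ik}$ on the other variables leads to
\[
U_{ik} \mid S_{ik} - b_{si},\, b_{si},\, Y_{2i} \sim N(\mu_{4ik}, V_{4ik}) \tag{c}
\]
where
\[
\begin{aligned}
\mu_{4ik} ={}& \left(B_{ik}^T T_{us|2} T_{ss|2}^{-1} - \Sigma_{us}\Sigma_{ss}^{-1}\right) b_{si}
 + \Sigma_{us}\Sigma_{ss}^{-1} S_{ik}
 + Y_{2i}^T T_{22}^{-1}\left(T_{2u} - T_{2s} T_{ss|2}^{-1} T_{su|2}\right) B_{ik} \\
&+ W_{1ik}^T\left(\beta_{u1} - \beta_{s1}^T \Sigma_{ss}^{-1}\Sigma_{su}\right)
 + W_{2i}^T\left(\beta_{u2} - \beta_{22}^T T_{22}^{-1}\left(T_{2u} - T_{2s} T_{ss|2}^{-1} T_{su|2}\right) B_{ik}
 - \beta_{s2}^T \Sigma_{ss}^{-1}\Sigma_{su}\right), \\
V_{4ik} ={}& \Sigma_{uu} - \Sigma_{us}\Sigma_{ss}^{-1}\Sigma_{su}
 + B_{ik}^T\left(T_{uu|2} - T_{us|2} T_{ss|2}^{-1} T_{su|2}\right) B_{ik},
\end{aligned}
\]
and $T_{ab|2} = T_{ab} - T_{a2} T_{22}^{-1} T_{2b}$ for $a, b \in \{u, s\}$.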
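
The conditional variance of $U_{ik}$ given $S_{ik} - b_{si}$, $b_{si}$ and $Y_{2i}$ can be checked numerically. The Python sketch below is a sanity check under stated assumptions, not part of the original derivation: it assumes the level-2 covariance $T$ of $(b_{ui}, b_{si}, Y_{2i})$ and the level-1 covariance $\Sigma$ of the residuals of $(U_{ik}, S_{ik})$, draws arbitrary positive definite matrices with hypothetical dimensions, and compares the generic conditional-normal formula with the closed form $\Sigma_{uu} - \Sigma_{us}\Sigma_{ss}^{-1}\Sigma_{su} + B_{ik}^T(T_{uu|2} - T_{us|2}T_{ss|2}^{-1}T_{su|2})B_{ik}$.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_spd(dim):
    """Arbitrary symmetric positive definite matrix (illustrative only)."""
    a = rng.normal(size=(dim, dim))
    return a @ a.T + dim * np.eye(dim)


q, ps, p2 = 2, 2, 2                  # hypothetical sizes of b_u, S and Y_2
T = random_spd(q + ps + p2)          # level-2 covariance of (b_u, b_s, Y_2)
Sig = random_spd(1 + ps)             # level-1 covariance of (e_u, e_s)
B = rng.normal(size=(q, 1))          # random-effect design vector B_ik

u, s, y = slice(0, q), slice(q, q + ps), slice(q + ps, q + ps + p2)
T_uu, T_us, T_u2 = T[u, u], T[u, s], T[u, y]
T_ss, T_s2, T_22 = T[s, s], T[s, y], T[y, y]
S_uu, S_us, S_ss = Sig[:1, :1], Sig[:1, 1:], Sig[1:, 1:]

# Generic conditional variance of U given (S - b_s, b_s, Y_2).
var_u = B.T @ T_uu @ B + S_uu
cov_u_rest = np.hstack([S_us, B.T @ T_us, B.T @ T_u2])
var_rest = np.block([
    [S_ss, np.zeros((ps, ps)), np.zeros((ps, p2))],
    [np.zeros((ps, ps)), T_ss, T_s2],
    [np.zeros((p2, ps)), T_s2.T, T_22],
])
v_generic = var_u - cov_u_rest @ np.linalg.solve(var_rest, cov_u_rest.T)

# Closed form written with the partial covariances T_{ab|2}.
inv22 = np.linalg.inv(T_22)
T_uu_2 = T_uu - T_u2 @ inv22 @ T_u2.T
T_us_2 = T_us - T_u2 @ inv22 @ T_s2.T
T_ss_2 = T_ss - T_s2 @ inv22 @ T_s2.T
v_closed = (S_uu - S_us @ np.linalg.solve(S_ss, S_us.T)
            + B.T @ (T_uu_2 - T_us_2 @ np.linalg.solve(T_ss_2, T_us_2.T)) @ B)
```

The two quantities `v_generic` and `v_closed` agree to numerical precision for any positive definite choice of `T` and `Sig`.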
Model (c) implies model (a) if $b_{si} = 0$. Model (c) with $b_{si} = 0$, however, imposes too
strong an assumption, namely that $S_{ik}$ does not vary across level-2 units. Violation of the
assumption leads to substantially biased inferences. Alternatively, model (c) implies model (a) if


OP SB, enl sh and 2 a, oy (d) 


ss|2 


In the following, we discuss constraints and transformation formulas for two cases: Bj, = 1 and 
BE = [1 ae | with ps covariates Xq;, having random coefficients in model (1). If By, = 1, then 


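the one-to-one transformation formulas between models (a) and (c) are
\[
\begin{aligned}
&\alpha_1 = \Sigma_{ss}^{-1}\Sigma_{su}, \qquad
\alpha_2 = T_{22}^{-1}\left(T_{2u} - T_{2s}\alpha_1\right), \qquad
\alpha_3 = \beta_{u1} - \beta_{s1}^T\alpha_1, \\
&\alpha_4 = \beta_{u2} - \beta_{s2}^T\alpha_1 - \beta_{22}^T\alpha_2, \qquad
D = T_{uu} - \alpha_2^T T_{22}\alpha_2 - 2\alpha_1^T T_{s2}\alpha_2 - \alpha_1^T T_{ss}\alpha_1, \\
&1 = \Sigma_{uu} - \alpha_1^T\Sigma_{ss}\alpha_1, \qquad
T_{us} = \alpha_1^T T_{ss} + \alpha_2^T T_{2s}.
\end{aligned} \tag{e}
\]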


Lagu Tupus Tugs 
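
For the random-intercept case, the transformation between the constrained joint model and the LVM can be illustrated with a round trip in Python. This is a sketch under assumptions: the dictionary keys, the dimensions, and the helper `lvm_to_joint` are hypothetical, and the formulas coded are our reading of (e), namely $\alpha_1 = \Sigma_{ss}^{-1}\Sigma_{su}$, $\alpha_2 = T_{22}^{-1}(T_{2u} - T_{2s}\alpha_1)$, $\alpha_3 = \beta_{u1} - \beta_{s1}^T\alpha_1$, $\alpha_4 = \beta_{u2} - \beta_{s2}^T\alpha_1 - \beta_{22}^T\alpha_2$ and $D = T_{uu} - \alpha_2^T T_{22}\alpha_2 - 2\alpha_1^T T_{s2}\alpha_2 - \alpha_1^T T_{ss}\alpha_1$.

```python
import numpy as np


def joint_to_lvm(j):
    """Joint-model parameters -> LVM parameters (random intercept, B_ik = 1)."""
    a1 = np.linalg.solve(j["S_ss"], j["S_su"])
    a2 = np.linalg.solve(j["T_22"], j["T_2u"] - j["T_s2"].T @ a1)
    a3 = j["b_u1"] - j["b_s1"].T @ a1
    a4 = j["b_u2"] - j["b_s2"].T @ a1 - j["b_22"].T @ a2
    D = (j["T_uu"] - a2 @ j["T_22"] @ a2
         - 2 * a1 @ j["T_s2"] @ a2 - a1 @ j["T_ss"] @ a1)
    return a1, a2, a3, a4, D


def lvm_to_joint(a1, a2, a3, a4, D, nuis):
    """Inverse map: fill in the joint-model blocks implied by the LVM."""
    j = dict(nuis)
    j["S_su"] = nuis["S_ss"] @ a1
    j["T_2u"] = nuis["T_22"] @ a2 + nuis["T_s2"].T @ a1
    j["T_uu"] = (D + a2 @ nuis["T_22"] @ a2
                 + 2 * a1 @ nuis["T_s2"] @ a2 + a1 @ nuis["T_ss"] @ a1)
    j["b_u1"] = a3 + nuis["b_s1"].T @ a1
    j["b_u2"] = a4 + nuis["b_s2"].T @ a1 + nuis["b_22"].T @ a2
    return j


rng = np.random.default_rng(1)
ps, p2, p3, p4 = 3, 2, 3, 3          # hypothetical dimensions


def spd(dim):
    a = rng.normal(size=(dim, dim))
    return a @ a.T + dim * np.eye(dim)


T = spd(ps + p2)                     # level-2 covariance of (b_s, Y_2)
nuis = {
    "S_ss": spd(ps),
    "T_ss": T[:ps, :ps], "T_s2": T[:ps, ps:], "T_22": T[ps:, ps:],
    "b_s1": rng.normal(size=(ps, p3)),
    "b_s2": rng.normal(size=(ps, p4)),
    "b_22": rng.normal(size=(p2, p4)),
}
truth = (rng.normal(size=ps), rng.normal(size=p2),
         rng.normal(size=p3), rng.normal(size=p4), 2.0)
recovered = joint_to_lvm(lvm_to_joint(*truth, nuis))
```

Recovering `truth` from the constructed joint model is what the one-to-one correspondence means in practice: the constrained joint model carries exactly the LVM parameters plus the nuisance blocks in `nuis`.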


If Be = [1 wal: then let b,,; = [Duos GE alt (a bs pe ive eee 
freee fee 0 
and Ty. = [Tj\. 0|". Note that we assume cov(by,,;,0si) = COV(bu,i, b2:) = 0. Non-zero covari- 


ances can be estimated, but they introduce extraneous terms and make interpretable difficulty. Let 


= ab T2902 + 207 T2012 + atT,5Q1 0 . 
T= . Then the one-to-one transformation formulas for 


0 0 
Q2, D, and T),,, are 


a2 = Te: (Tuo —) T25Q1), D= Fy _ f, thahee = OTe + as Ts, (f) 
and the others keep same as these in (e). 
Appendix B 


Parameter Estimation 


The maximum likelihood estimators (MLE) for the complete data, derived from the complete-data
likelihood $L(\theta \mid U_i, S_i, Y_{2i}, b_{ui}, b_{si}, b_{2i})$, are


2 k 

a (k (e- * - 
iP =r (Syvawz) LY view, 
i=1 k=1 i=l k=l 


gr) — pre “ae @ (WE, on) yee @ (Wie), 


n Oe 
BB: ee Ti @ (Wai Wo; ») eae ® Wai) (bai — Tn Thy'bii) , (8) 


3 


i=l i=1 
wer, ~ These 
bj a a Se == 3 ee 
i=1 die ai ki 1 k=1 
n ky 
Y= = : ze SSS cui tins T=- aes 
i=l i=1 k=1 
n ky 
6h) = GY 4 (S294 ye ) ee D=- De ie 
i=1 k=1 i=1 k=1 


Given 4 and D fora random-intercept model (1), we update the estimators ae ids ie ias Bua, Buos 
es and Ty. in model (7) via formulas (e). Given @ and D for a random-coefficient model (1), we 
update the estimators, ie Sie 7 bare Bus Buos T. uo2> and T. uos 1 model (7) via formulas (f) and set 
Le Hae =U; 


At the E-step, we compute the following conditional expectations.


Vir = BaWrr + Big2Wai + Au(V2)"(¥P — 17), 

BUA SU De BT Bet ig (Ve) AL. 

Eing = Aer(Ve) "(YP — ui), E(eiesl¥P) = Clay + 7) — Mer (Vi2)*AZ,, 

Gig = Aa(VE) (YP — uf), ElaRl¥°) = ai, + & — Aa(VP) 7 AY, (h) 
B(e¥2) = Mes(Ve) (VP — 17), Em = Ac(VE) (VP — H2), 

bf = A,(Ve) "(Ve — wf), E(ororT|¥?) = bre? + T — AVP)“ AR, 


v7 


E(Uixeinj|¥ 2) = Uineing — Au(Ve) "AZ, E(eanetigl¥2) = Gin€i, + U — Ae(VO)7 TAZ, 


where A, = [Au Au2 BFTu2] OF, Aer = [01x(@—1 745-1) 7} O1x(I=9) O1x((i6;—k) Jtpr kita) |OF> 
AG i [01x (5-1); 1s Oretr-aesemaeplOEs Ne = OF; Me =


E28 Ye OY, 1p) Daas 0 


Yeon eo? Are ye -0 
Of, and Ay = | (T,,BP)@ 77 12 @Ty, Tz | Of for Au = 


yok! wa, APe yy. 0 
(eB yO 1} @ Tb, Tho 


(BUT aes + [01x (&—1) pa 01x (e;—-k)]) er, Ai ire ies & (BEL a) = [01 (e—1)p1 aus Orton 
and A, is a vector with the k’” element equal to 1 and zero otherwise, 


In addition, we calculate E(A;,A4|Y,°), E(Ainesn|Y2), and E(b,b7 |Y,°) in the LVM. 


E(SinSh\¥o) E(SYS|¥2) SuWh, SinWd 

B(YaSRIY?) E(YaYSI¥2) YoWE,  YoaWs 
Wai€ix

E(bjb7 Yo) = 667 + D — Age(Ve) AT, (k) 


where Siz = BeiWiix + Bs2Wai + As(Vie)(¥2 — 12), E(Sx.52/¥°) = Sin Sd, ae gh agg 
A.(V2)1AP, Yor = BoxaWoit Ay(Ve)- (Ve — ue), EB (WaYar V2) = Yoo¥S +To2—Ay(Ve)1AF, 
E(SiY¥S|V 2) = Sie¥$+Ts2-As(Ve)1 AT, and &, = Acc(VP)-(¥2—p9) for Ay = [Agr Aso 17 
Appendix C
Calculation of the Information Matrix

The information matrix is obtained by twice differentiating the observed marginal multivariate
normal log-likelihood with mean and covariance given in (9), but we introduce the new parameters
$\alpha_1$ and $\alpha_2$, which are defined in (5). Consequently, the parameters $\Sigma_{us}$,
$T_{u2}$, $T_{us}$, and $\Sigma_{uu}$ are functions of $\alpha_1$, $\alpha_2$ and the other
elements in $\Sigma$ and $T$:
\[
\begin{aligned}
&\Sigma_{us} = \alpha_1^T \Sigma_{ss}, \qquad
T_{u2} = \alpha_2^T T_{22} + \alpha_1^T T_{s2}, \\
&T_{us} = \alpha_2^T T_{2s} + \alpha_1^T T_{ss}, \qquad
\Sigma_{uu} = 1 + \alpha_1^T \Sigma_{ss}\alpha_1.
\end{aligned} \tag{l}
\]


Let W(A) denote a vector by horizontally arranging the elements in the matrix A and y = 
(Yo, V1, 8**) in which B** = [67 W(6.1)? W(8s2)? W(822)7|?. The arrangement makes us easily 
extract the covariances between W (3.1), W(Gs2), W (S22) and a1, a2 to estimate the variances of 
a3, a4 and D by multivariate Delta method. let H; = O;03_, Hi; with Hi) = [lp, @ 17 (WuiGu) © Ly], 


Hig |e @ Igy Wie Oly, “Woe S Ty ty; and His =". Ty Worl FG =O; OF_4 rar 


fy 
with Fj, = [ly, @1l7 Wu ® yj], Fie = Hig, and Fi3 = Hi3, G; = HA; ; 
O(s+psp1 +pap2)x J 

07x : 

M, = 4H; Ty , and Q; = F; Pe EIR Ee . The expected information ma- 
I 53-+pspitpap2 
0 (psp1+pap2) x J 
trix for the MLE of y = (70, 1, 3**) is 
GT(V2) "Gi: GE (Ve) M; Ge) 
Iyy = DL |MP(Ve)G, A+MI(Ve)1M, MP(Ve)"Q, (m) 


i=1 


PVE) UG: QVM OF(VP)"10:, 


where A has its (j, k)th component $tr (Ve) 35 (Vey 3a). 
Define V (A) a vector by vertically arranging the distinct elements of the matrix A. Let 6 = 


(E, ss Dias VT a) V (Tos), V (To), V (Sas), 1, 2) = (61, do, mated , Om) for M = 35 and M = 36 
in a random-intercept model (1) and in a random-coefficient model (1), respectively. Then
oe ov? 
Is,6.4 = tr ( (V; Ve)yt= 
6; Bik 2 (wo i 5, | i ) | ’ 


aa eer her 
tou = 13500 use), 
a) 


i=1 


(n) 


ave . : Sooaitd : 
and J5,, = 0, [5g = 0, where ap,, 1S an element-wise derivative with respect to 6,, form = 


1 Ds 


Appendix D 

The variance calculation of the parameters in the LVM 

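The variances of the estimators of $\alpha_1$, $\alpha_2$, $\gamma_0$, $\gamma_1$, $\xi$ and $\tau$
in the LVM are estimated in Appendix C. Let
$\theta_1 = [\beta_{u1}^T \;\; W(\beta_{s1})^T \;\; \alpha_1^T]^T$,
$\theta_2 = [\beta_{u2}^T \;\; W(\beta_{s2})^T \;\; W(\beta_{22})^T \;\; \alpha_1^T \;\; \alpha_2^T]^T$, and
$\theta_3 = [T_{uu} \;\; V(T_{ss})^T \;\; V(T_{s2})^T \;\; V(T_{22})^T \;\; \alpha_1^T \;\; \alpha_2^T]^T$.
From the transformation formulas (5) and the Delta method, the covariances of $\hat\alpha_3$,
$\hat\alpha_4$, and $\hat D$ with $B_{ik} = 1$ are estimated as
\[
\mathrm{cov}(\hat\alpha_3) = \nabla f_1\, \mathrm{cov}(\hat\theta_1)\, \nabla f_1^T, \qquad
\mathrm{cov}(\hat\alpha_4) = \nabla f_2\, \mathrm{cov}(\hat\theta_2)\, \nabla f_2^T, \qquad
\mathrm{cov}(\hat D) = \nabla f_3\, \mathrm{cov}(\hat\theta_3)\, \nabla f_3^T, \tag{o}
\]
where $\mathrm{cov}(\hat\theta_i)$ can be extracted from the inverse of the Fisher information matrix
in Appendix C,
\[
\nabla f_1 = \left[ I_{p_3} \;\; -\alpha_1^T \otimes I_{p_3} \;\; -\beta_{s1}^T \right], \qquad
\nabla f_2 = \left[ I_{p_4} \;\; -(\alpha_1^T \otimes I_{p_4}) \;\; -(\alpha_2^T \otimes I_{p_4}) \;\; -\beta_{s2}^T \;\; -\beta_{22}^T \right],
\]
and
\[
\nabla f_3 = \left[ 1 \;\;
\left(\frac{\partial D}{\partial V(T_{ss})}\right)^T \;\;
\left(\frac{\partial D}{\partial V(T_{s2})}\right)^T \;\;
\left(\frac{\partial D}{\partial V(T_{22})}\right)^T \;\;
\left(\frac{\partial D}{\partial \alpha_1}\right)^T \;\;
\left(\frac{\partial D}{\partial \alpha_2}\right)^T \right]
\]
for
\[
\frac{\partial D}{\partial V(T_{ss})_j} = -\alpha_1^T \frac{\partial T_{ss}}{\partial V(T_{ss})_j}\alpha_1, \qquad
\frac{\partial D}{\partial V(T_{s2})_j} = -2\alpha_1^T \frac{\partial T_{s2}}{\partial V(T_{s2})_j}\alpha_2, \qquad
\frac{\partial D}{\partial V(T_{22})_j} = -\alpha_2^T \frac{\partial T_{22}}{\partial V(T_{22})_j}\alpha_2,
\]
\[
\frac{\partial D}{\partial \alpha_1} = -2T_{s2}\alpha_2 - 2T_{ss}\alpha_1, \qquad
\frac{\partial D}{\partial \alpha_2} = -2T_{22}\alpha_2 - 2T_{2s}\alpha_1.
\]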

Note that $\partial T_{ss}/\partial V(T_{ss})_j$, $\partial T_{s2}/\partial V(T_{s2})_j$ and
$\partial T_{22}/\partial V(T_{22})_j$ are unknown. We know that for any symmetric p-by-p matrix
$\Psi_1$, the first derivative with respect to the $(l, k)$th ($k \ge l$) element is
\[
\frac{\partial \Psi_1}{\partial \Psi_{1,lk}} =
\begin{cases}
\delta_l \delta_k^T + \delta_k \delta_l^T, & k > l, \\
\delta_l \delta_l^T, & k = l,
\end{cases} \tag{p}
\]
and for any p-by-q ($p \ne q$) matrix $\Psi_2$ the first derivative with respect to the $(l, k)$th
element is
\[
\frac{\partial \Psi_2}{\partial \Psi_{2,lk}} = \delta_l \eta_k^T, \tag{q}
\]
where $\delta_h$ and $\eta_h$ are p-by-1 and q-by-1 vectors with the $h$th element equal to one and
zero otherwise, respectively. After we vertically arrange the distinct elements in $\Psi_1$ and
$\Psi_2$, the first derivative with respect to the $j$th element for $j = 1, 2, \ldots, p(p+1)/2$ or
$j = 1, 2, \ldots, pq$ has a one-to-one correspondence with equations (p) and (q), respectively.
Similarly, the variances of the distinct elements in $D$ can be estimated for a random-coefficient
model (1).
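
The Delta-method step for $\hat\alpha_3$ can be illustrated in Python. This is a sketch under assumptions, not the authors' program: it takes $\alpha_3 = \beta_{u1} - \beta_{s1}^T\alpha_1$ with $\theta_1$ stacking $\beta_{u1}$, the rows of $\beta_{s1}$ and $\alpha_1$, uses hypothetical dimensions, and substitutes an arbitrary positive definite matrix `cov_theta` for the block of the inverse information matrix. The analytic gradient $[I \;\; -\alpha_1^T \otimes I \;\; -\beta_{s1}^T]$ is compared against central finite differences.

```python
import numpy as np

ps, p3 = 2, 3   # hypothetical dimensions of S_ik and W_1ik


def split(theta):
    """Unpack theta_1 = [beta_u1, W(beta_s1), alpha_1]."""
    b_u1 = theta[:p3]
    b_s1 = theta[p3:p3 + ps * p3].reshape(ps, p3)   # rows laid side by side
    a1 = theta[p3 + ps * p3:]
    return b_u1, b_s1, a1


def alpha3(theta):
    b_u1, b_s1, a1 = split(theta)
    return b_u1 - b_s1.T @ a1


def grad_alpha3(theta):
    """Analytic gradient [I, -(alpha_1' kron I), -beta_s1']."""
    _, b_s1, a1 = split(theta)
    eye = np.eye(p3)
    return np.hstack([eye, -np.kron(a1[None, :], eye), -b_s1.T])


def numerical_jacobian(f, theta, h=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    cols = []
    for m in range(theta.size):
        step = np.zeros_like(theta)
        step[m] = h
        cols.append((f(theta + step) - f(theta - step)) / (2 * h))
    return np.column_stack(cols)


rng = np.random.default_rng(2)
theta_hat = rng.normal(size=p3 + ps * p3 + ps)
a = rng.normal(size=(theta_hat.size, theta_hat.size))
cov_theta = a @ a.T / 100 + 0.01 * np.eye(theta_hat.size)  # placeholder covariance

G = grad_alpha3(theta_hat)
cov_alpha3 = G @ cov_theta @ G.T          # Delta-method covariance
se_alpha3 = np.sqrt(np.diag(cov_alpha3))  # standard errors of alpha_3
```

The same pattern applies to $\hat\alpha_4$ and $\hat D$ with their own gradients; only the function and its Jacobian change.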
Table 1: Estimation of the simulated LVM (11) by three different estimation methods

Model  Para.       True value  Estimate (S.E.^a)
                               Benchmark      Just-identified  Over-identified
(1)    $\alpha_1$  1           1.031 (0.075)  1.032 (0.075)    0.901 (0.065)
                   1           1.007 (0.024)  1.007 (0.024)    0.882 (0.021)
       $D$         1           0.999 (0.069)  0.999 (0.069)    0.751 (0.052)
(2)    $\gamma_0$  1           0.993 (0.054)  0.993 (0.054)    0.993 (0.054)
                   1           1.026 (0.055)  1.026 (0.055)    1.026 (0.054)
       $\gamma_1$  1           0.987 (0.014)  0.987 (0.014)    1.129 (0.016)
                   1           0.987 (0.014)  0.987 (0.014)    1.129 (0.016)
       $\xi$       0.25        0.268 (0.035)  0.268 (0.035)    0.267 (0.035)
                   0.25        0.291 (0.035)  0.291 (0.036)    0.291 (0.036)
       $\tau$      0.25        0.240 (0.018)  0.240 (0.018)    0.240 (0.018)
                   0.25        0.258 (0.019)  0.258 (0.019)    0.258 (0.018)

^a standard error


Table 2: Estimation of the simulated LVM (11) given ignorable missing data

Model  Para.       True value  Estimate (S.E.^a)
                               Just-identified  Over-identified
(1)    $\alpha_1$  1           0.994 (0.033)    0.878 (0.031)
                   1           1.061 (0.081)    0.932 (0.058)
       $D$         1           1.010 (0.082)    0.763 (0.040)
(2)    $\gamma_0$  1           0.987 (0.056)    0.987 (0.044)
                   1           1.013 (0.056)    1.013 (0.045)
       $\gamma_1$  1           0.984 (0.019)    1.120 (0.033)
                   1           0.981 (0.019)    1.117 (0.033)
       $\xi$       0.25        0.248 (0.025)    0.248 (0.031)
                   0.25        0.317 (0.041)    0.318 (0.033)
       $\tau$      0.25        0.249 (0.025)    0.249 (0.018)
                   0.25        0.260 (0.025)    0.260 (0.018)

^a standard error


Table 3: NGHS data for analysis

Level    Variable     Description                           Mean (S.E.)    Missing (%)
Level 1  BMI          BMI (kg/m^2)                          22.42 (5.81)   308 (1.5)
         Skinfold     sum of skinfolds (mm)                 45.11 (24.88)  783 (3.8)
         Waist        max. below-waist circumference (cm)   93.95 (12.87)  2807 (13.5)
         PercentFat   percent fat by BIA                    25.29 (11.49)  1694 (8.1)
         Age          age in years at time of visit         14.36 (2.99)   0 (0.0)
         TV           TV watching (hours/week)              31.35 (21.32)  4834 (23.2)
         PhysicalAct  physical activity pattern score       17.35 (17.75)  6573 (31.5)
         Maturation   maturation stage                      2.10 (1.03)    1063 (5.1)
Level 2  MotherBMI    mother's BMI                          27.35 (6.91)   6772 (32.4)
         ParentEd     maximum parental education^a          0.75 (0.43)    0 (0.0)
         Income       household income^b                    1.06 (0.83)    1156 (5.5)
         White        race (white/black)^c                  0.48 (0.50)    0 (0.0)
         OneParents   single-parent family^d                0.31 (0.46)    0 (0.0)

^a 1 if some college or more, 0 otherwise
^b 0 if < $20k, 2 if > $40k, 1 otherwise
^c 1 if white, 0 if black
^d 1 if single-parent family, 0 if two-parent family


Table 4: Estimated model (1)

Para.       Covariate    Estimate (S.E.)
                         Random intercept                  Random coefficient
                         MCAR             MAR              MCAR             MAR
$\alpha_1$  TV           0.005‡ (0.001)   0.004‡ (0.001)   0.006‡ (0.001)   0.004‡ (0.001)
            PhysicalAct  -0.004‡ (0.001)  -0.003‡ (0.001)  -0.002† (0.001)  -0.002† (0.001)
            Maturation   0.504‡ (0.043)   0.347‡ (0.021)   0.606‡ (0.051)   0.387‡ (0.024)
$\alpha_2$  MotherBMI    0.149‡ (0.012)   0.150‡ (0.011)   0.156‡ (0.015)   0.133‡ (0.013)
            Income       -0.285† (0.127)  -0.183 (0.096)   -0.176 (0.158)   0.078 (0.114)
$\alpha_3$  AGE          0.454‡ (0.047)   0.502‡ (0.020)   0.647‡ (0.057)   0.713‡ (0.024)
            AGE^2        -0.025 (0.013)   -0.025‡ (0.005)  -0.039† (0.015)  -0.031‡ (0.005)
            AGE×White    -0.032 (0.063)   -0.057† (0.026)  -0.081 (0.079)   -0.124‡ (0.033)
$\alpha_4$  ParentEd     0.041 (0.219)    0.012 (0.155)    0.094 (0.237)    0.144 (0.179)
            White        -0.330 (0.176)   -0.309† (0.137)  -0.694‡ (0.225)  -0.938‡ (0.186)
            OneParent    0.357 (0.217)    0.380† (0.159)   0.610† (0.271)   0.568‡ (0.185)
$D_{00}$*                8.735 (0.575)    8.040 (0.386)    16.935 (0.802)   16.482 (0.560)
$D_{01}$                                                   0.818 (0.060)    0.942 (0.043)
$D_{11}$                                                   0.154 (0.009)    0.155 (0.006)

* $D_{00} = D$ in a random-intercept model (1)
† p-value < 0.05, ‡ p-value < 0.01


Table 5: Estimated model (2) given incomplete data

Model (2) with      Biomarker   $\beta_{0j}$  $\beta_{1j}$  $\tau_j$      $\xi_j$
random intercept    BMI         22.74 (0.09)  1.46 (0.01)   1.06 (0.02)   1.07 (0.09)
                    Skinfold    47.30 (0.37)  5.24 (0.04)   75.05 (0.83)  73.38 (2.73)
                    Waist       93.46 (0.29)  4.37 (0.03)   2.02 (0.10)   29.72 (1.16)
                    PercentFat  25.88 (0.19)  2.69 (0.02)   15.29 (0.18)  21.00 (0.75)
random coefficient  BMI         22.74 (0.09)  1.08 (0.01)   0.54 (0.01)   0.86 (0.08)
                    Skinfold    47.37 (0.38)  3.95 (0.03)   65.06 (0.73)  69.98 (2.56)
                    Waist       93.49 (0.28)  3.04 (0.02)   6.04 (0.11)   24.79 (0.97)
                    PercentFat  25.85 (0.19)  1.95 (0.02)   15.40 (0.18)  21.88 (0.76)


Table 6: Complete-case analysis of model (2)

Model (2) with      Biomarker   $\beta_{0j}$  $\beta_{1j}$  $\tau_j$      $\xi_j$
random intercept    BMI         22.94 (0.12)  1.74 (0.02)   0.80 (0.02)   1.12 (0.10)
                    Skinfold    46.98 (0.48)  6.41 (0.07)   70.74 (1.25)  58.82 (2.88)
                    Waist       95.02 (0.33)  4.52 (0.04)   3.55 (0.15)   18.22 (0.97)
                    PercentFat  26.52 (0.25)  3.15 (0.04)   14.58 (0.26)  19.96 (0.93)
random coefficient  BMI         23.01 (0.12)  1.21 (0.01)   0.44 (0.02)   0.93 (0.09)
                    Skinfold    47.25 (0.49)  4.46 (0.06)   66.98 (1.18)  52.75 (2.77)
                    Waist       95.20 (0.32)  3.05 (0.04)   5.51 (0.15)   17.88 (0.93)
                    PercentFat  26.63 (0.25)  2.13 (0.03)   15.74 (0.28)  20.54 (0.94)


Table 7: Correlations and % of sample correlations explained by LVM

Correlation between  Sample       Model correlation  % of correlation explained by
                     correlation  RIM     RCM        RIM     RCM
BMI and Skinfold     0.89         0.48    0.56       53.50   62.27
BMI and WC           0.92         0.77    0.83       83.44   89.63
BMI and PBF          0.82         0.53    0.59       64.71   72.43
Skinfold and WC      0.81         0.65    0.71       79.84   87.39
Skinfold and PBF     0.81         0.45    0.51       D029    63.07
WC and PBF           0.81         0.72    0.76       88.73   93.39

RIM: random-intercept model; RCM: random-coefficient model; WC: waist circumference;
PBF: percent body fat


Figure 1: Scatter plots of obesity score against household income and maturation stages
(vertical axes: obesity score; latent obesity score)

Figure 2: Obesity growth curves for blacks and whites